A Heterogeneous Naive-Bayesian Classifier for Relational Databases
Authors
Geetha Manjunath, M Narasimha Murty, Dinkar Sitaram
Abstract
HP Laboratories, HPL-2009-225. Keywords: Relational databases, Classification, Data Mining, RDF.
Most enterprise data is distributed across multiple relational databases with expert-designed schemas. Applying single-table data mining techniques to distributed relational data not only incurs a computational penalty for converting the data to a "flat" form (mega-join), but also loses the human-specified semantic information present in the relations and schema. Purely relational classification algorithms, on the other hand, do consider detailed relationships between attributes. However, these techniques require either computationally intensive transformations or multiple analyses of fused datasets, which becomes infeasible in practical scenarios. Since classification is one of the most popular predictive data mining tasks, we need practical algorithms that can be applied directly to existing databases. We present such a practical two-phase classification algorithm for relational databases with a semantic divide-and-conquer approach. We propose and prove a recursive prediction aggregation technique over heterogeneous classifiers applied to individual tables. Our approach also attempts to leverage the semantic knowledge of the application that is hidden in the database schema, using the Join Graph of the application. To automate the classification process, RDF (the core Semantic Web data model) is used for problem specification. A preliminary evaluation over TPCH and UCI benchmarks shows reduced training time in automated practical scenarios, without any loss of prediction accuracy. In fact, a comparison with other state-of-the-art techniques shows improved accuracy, owing to the application of heterogeneous classifiers to individual tables.
External Posting Date: September 6, 2009. Approved for External Publication. Internal Posting Date: September 6, 2009. Copyright 2009 Hewlett-Packard Development Company, L.P.
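The abstract describes aggregating the predictions of separate classifiers trained on individual tables. The sketch below illustrates one standard way to do this, assuming the attributes of different tables are conditionally independent given the class (the naive-Bayes assumption). In that case P(c | x_1..x_k) is proportional to the product of the per-table posteriors P(c | x_i) divided by P(c)^(k-1). The function name `combine_table_posteriors` and its inputs are hypothetical, and this is not necessarily the exact aggregation rule derived in the report.

```python
from math import exp, log


def combine_table_posteriors(prior, table_posteriors):
    """Fuse per-table class posteriors under a naive-Bayes assumption.

    prior: dict mapping class label -> P(c)
    table_posteriors: list of dicts, each mapping class label -> P(c | x_i),
        where x_i are the attributes of table i (any classifier may have
        produced them, so the per-table models can be heterogeneous).
    Returns a dict mapping class label -> P(c | x_1..x_k), normalized to 1.
    """
    k = len(table_posteriors)
    log_scores = {}
    for label, p_c in prior.items():
        # log P(c | x_1..x_k) = sum_i log P(c | x_i) - (k - 1) log P(c) + const
        score = -(k - 1) * log(p_c)
        for posterior in table_posteriors:
            # Clamp to avoid log(0) when a classifier outputs a hard zero.
            score += log(max(posterior[label], 1e-12))
        log_scores[label] = score

    # Normalize in a numerically stable way (subtract the max log score).
    top = max(log_scores.values())
    unnormalized = {label: exp(s - top) for label, s in log_scores.items()}
    total = sum(unnormalized.values())
    return {label: v / total for label, v in unnormalized.items()}


if __name__ == "__main__":
    prior = {"yes": 0.5, "no": 0.5}
    per_table = [{"yes": 0.8, "no": 0.2}, {"yes": 0.6, "no": 0.4}]
    print(combine_table_posteriors(prior, per_table))
```

With a uniform prior and two tables voting 0.8 and 0.6 for "yes", the fused posterior for "yes" is 6/7, higher than either table alone, because two independent pieces of evidence agree. Working in log space keeps the product stable when many tables are combined.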
Similar resources
Simple Estimators for Relational Bayesian Classifiers
In this paper we present the Relational Bayesian Classifier (RBC), a modification of the Simple Bayesian Classifier (SBC) for relational data. There exist several Bayesian classifiers that learn predictive models of relational data, but each uses a different estimation technique for modeling heterogeneous sets of attribute values. The effects of data characteristics on estimation have not been ...
A New Hybrid Framework for Filter based Feature Selection using Information Gain and Symmetric Uncertainty (TECHNICAL NOTE)
Feature selection is a pre-processing technique for eliminating irrelevant and redundant features, which enhances the performance of classifiers. When a dataset contains many irrelevant and redundant features, accuracy fails to improve and classifier performance degrades. To avoid this, this paper presents a new hybrid feature selection method usi...
A Probabilistic Bayesian Classifier Approach for Breast Cancer Diagnosis and Prognosis
Medical diagnosis is among the most influential components of treatment policies. Recently, significant advances have been made in medical diagnosis using data mining techniques. Data mining, or knowledge discovery, is the search of large databases to discover patterns and evaluate the probability of future occurrences. In this paper, Bayesian Classifier is used as a Non-linear dat...
An Efficient Multi-relational Naïve Bayesian Classifier Based on Semantic Relationship Graph
Classification is one of the most popular data mining tasks with a wide range of applications, and lots of algorithms have been proposed to build accurate and scalable classifiers. Most of these algorithms only take a single table as input, whereas in the real world most data are stored in multiple tables and managed by relational database systems. As transferring data from multiple tables into...